[SPARK-55128][INFRA] Restore SQL tests by pin 'pandas==2.3.3'#53910

Closed
zhengruifeng wants to merge 2 commits into apache:master from zhengruifeng:restore_sql

@zhengruifeng
Contributor

@zhengruifeng zhengruifeng commented Jan 22, 2026

What changes were proposed in this pull request?

Restore SQL tests by pinning 'pandas==2.3.3'.

Why are the changes needed?

pandas 3 was just released, and it fails the SQL tests:

https://github.com/apache/spark/actions/runs/21232213791/job/61092886134

Currently, pandas 3 has little effect on the Python tests:
1. In dev/requirements.txt, the latest mlflow==3.8.1 requires pandas<3.
2. pandas==2.3.3 is already pinned in most places.
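
To make the two points concrete, here is a minimal sketch of how a `pandas<3` bound and a `pandas==2.3.3` pin treat the versions involved. The helper names `parse_version` and `satisfies` are hypothetical, and they handle only plain numeric versions with the `==` and `<` operators; pip applies the full version specifier rules, which this does not attempt to reproduce.

```python
import operator


def parse_version(text: str) -> tuple:
    """Turn a plain numeric version such as '2.3.3' into a tuple of ints."""
    return tuple(int(part) for part in text.strip().split("."))


def pad(version: tuple, width: int) -> tuple:
    """Right-pad with zeros so that (3,) compares like (3, 0, 0)."""
    return version + (0,) * (width - len(version))


def satisfies(version: str, spec: str) -> bool:
    """Check a version against a single '==X' or '<X' specifier (hypothetical helper)."""
    for symbol, compare in (("==", operator.eq), ("<", operator.lt)):
        if spec.startswith(symbol):
            bound = parse_version(spec[len(symbol):])
            candidate = parse_version(version)
            width = max(len(bound), len(candidate))
            return compare(pad(candidate, width), pad(bound, width))
    raise ValueError(f"unsupported specifier: {spec!r}")


# An unpinned 'pandas' resolves to the newest release, which the bound rejects.
print(satisfies("3.0.0", "<3"))
# The pinned version satisfies both the exact pin and the mlflow bound.
print(satisfies("2.3.3", "==2.3.3"), satisfies("2.3.3", "<3"))
```

Under these assumptions, an install step that names `pandas` without a specifier is the only kind of place where pandas 3 can be picked up.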

Does this PR introduce any user-facing change?

no

How was this patch tested?

CI

Was this patch authored or co-authored using generative AI tooling?

no

@github-actions

JIRA Issue Information

=== Test SPARK-55128 ===
Summary: Restore SQL tests by pin 'pandas<3'
Assignee: None
Status: Open
Affected: ["4.2"]


This comment was automatically generated by GitHub Actions

@github-actions github-actions bot added the INFRA label Jan 22, 2026
@zhengruifeng zhengruifeng changed the title from "[SPARK-55128][INFRA] Restore Restore SQL tests by pin 'pandas<3'" to "[SPARK-55128][INFRA] Restore SQL tests by pin 'pandas<3'" Jan 22, 2026
@zhengruifeng zhengruifeng changed the title from "[SPARK-55128][INFRA] Restore SQL tests by pin 'pandas<3'" to "[SPARK-55128][INFRA] Restore SQL tests by pin 'pandas==2.3.3'" Jan 22, 2026
@zhengruifeng
Contributor Author

  if: (contains(matrix.modules, 'sql') && !contains(matrix.modules, 'sql-')) || contains(matrix.modules, 'connect') || contains(matrix.modules, 'yarn')
  run: |
-   python3.11 -m pip install 'numpy>=1.22' pyarrow pandas pyyaml scipy unittest-xml-reporting 'lxml==4.9.4' 'grpcio==1.76.0' 'grpcio-status==1.76.0' 'protobuf==6.33.0' 'zstandard==0.25.0'
+   python3.11 -m pip install 'numpy>=1.22' pyarrow 'pandas==2.3.3' pyyaml scipy unittest-xml-reporting 'lxml==4.9.4' 'grpcio==1.76.0' 'grpcio-status==1.76.0' 'protobuf==6.33.0' 'zstandard==0.25.0'
Contributor

Is this the only place where the version needs to be pinned? For example, is it not necessary in other places like requirements.txt?

Contributor Author

@zhengruifeng zhengruifeng Jan 22, 2026

The requirements.txt file is only for developers, and we may want to develop and test against pandas 3.0 in a local environment.
In addition, the current mlflow already requires pandas<3.
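
For readers unfamiliar with the `if:` expression on the workflow step quoted in this thread, the following is a rough Python sketch of when that step runs. It assumes `matrix.modules` is a single string and relies on the documented behavior of the GitHub Actions `contains()` function for strings (a case-insensitive substring test). The function name `installs_python_deps` and the sample module strings are illustrative only and are not taken from the workflow file.

```python
def contains(search: str, item: str) -> bool:
    """Mimic GitHub Actions contains() for strings: case-insensitive substring test."""
    return item.lower() in search.lower()


def installs_python_deps(modules: str) -> bool:
    """Mirror the step's `if:` condition for a given matrix.modules string."""
    return (
        (contains(modules, "sql") and not contains(modules, "sql-"))
        or contains(modules, "connect")
        or contains(modules, "yarn")
    )


# Illustrative matrix values; the real job matrix may use different strings.
for modules in ("sql", "sql-kafka-0-10", "connect", "yarn", "core"):
    print(f"{modules!r}: {installs_python_deps(modules)}")
```

In this sketch, a job whose module string contains `sql-` skips the step unless it also names `connect` or `yarn`, so the pin only matters for the jobs that actually run this pip install.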

zhengruifeng added a commit that referenced this pull request Jan 22, 2026
Closes #53910 from zhengruifeng/restore_sql.

Authored-by: Ruifeng Zheng <ruifengz@apache.org>
Signed-off-by: Ruifeng Zheng <ruifengz@apache.org>
(cherry picked from commit dafb2cd)
Signed-off-by: Ruifeng Zheng <ruifengz@apache.org>
@zhengruifeng
Contributor Author

Merged to master and branch-4.1.

@zhengruifeng zhengruifeng deleted the restore_sql branch January 22, 2026 07:22
zhengruifeng pushed a commit that referenced this pull request Jan 23, 2026
…3' for maven daily test

### What changes were proposed in this pull request?
Similar to #53910, this PR pins the pandas version to 2.3.3.

### Why are the changes needed?
To restore the SQL tests for the Maven daily test:
- https://github.com/apache/spark/actions/runs/21249870076/job/61148348328

```
- udf/postgreSQL/udf-case.sql - Scalar Pandas UDF *** FAILED ***
  udf/postgreSQL/udf-case.sql - Scalar Pandas UDF
  Python: 3.11 Pandas: 3.0.0 PyArrow: 23.0.0
  Expected Some("struct<Two:string,i:int,f:double,i:int,j:int>"), but got Some("struct<>") Schema did not match for query #30
  SELECT '' AS `Two`, *
    FROM CASE_TBL a, CASE2_TBL b
    WHERE udf(COALESCE(f,b.i) = 2): -- !query
  SELECT '' AS `Two`, *
    FROM CASE_TBL a, CASE2_TBL b
    WHERE udf(COALESCE(f,b.i) = 2)
  -- !query schema
  struct<>
  -- !query output
  org.apache.spark.SparkRuntimeException
  {
    "errorClass" : "CAST_INVALID_INPUT",
    "sqlState" : "22018",
    "messageParameters" : {
      "ansiConfig" : "\"spark.sql.ansi.enabled\"",
      "expression" : "'nan'",
      "sourceType" : "\"STRING\"",
      "targetType" : "\"BOOLEAN\""
    },
    "queryContext" : [ {
      "objectType" : "",
      "objectName" : "",
      "startIndex" : 62,
      "stopIndex" : 85,
      "fragment" : "udf(COALESCE(f,b.i) = 2)"
    } ]
  } (SQLQueryTestSuite.scala:681)
```
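
As a reading aid for the log above, here is a small sketch that parses the error payload and locates the failing fragment in the query. The JSON and the query text are copied from the log; the two-space indentation of the query's continuation lines and the interpretation of `startIndex`/`stopIndex` as 1-based, inclusive offsets are assumptions, which the final check confirms for this particular example.

```python
import json

# Error payload copied from the failure log.
ERROR_JSON = r'''
{
  "errorClass" : "CAST_INVALID_INPUT",
  "sqlState" : "22018",
  "messageParameters" : {
    "ansiConfig" : "\"spark.sql.ansi.enabled\"",
    "expression" : "'nan'",
    "sourceType" : "\"STRING\"",
    "targetType" : "\"BOOLEAN\""
  },
  "queryContext" : [ {
    "objectType" : "",
    "objectName" : "",
    "startIndex" : 62,
    "stopIndex" : 85,
    "fragment" : "udf(COALESCE(f,b.i) = 2)"
  } ]
}
'''

# Query text from the log, assuming two-space indentation on continuation lines.
QUERY = (
    "SELECT '' AS `Two`, *\n"
    "  FROM CASE_TBL a, CASE2_TBL b\n"
    "  WHERE udf(COALESCE(f,b.i) = 2)"
)

error = json.loads(ERROR_JSON)
params = error["messageParameters"]
context = error["queryContext"][0]

# Assumption: offsets are 1-based and inclusive, so shift the start by one.
located = QUERY[context["startIndex"] - 1 : context["stopIndex"]]

print(f"{error['errorClass']}: cannot cast {params['expression']} "
      f"from {params['sourceType']} to {params['targetType']}")
print("fragment matches query slice:", located == context["fragment"])
```

The payload says that the string value 'nan' reached a cast to BOOLEAN while ANSI mode was enabled, inside the `udf(...)` predicate of the `WHERE` clause. The log itself does not say why the Pandas UDF produced that value under pandas 3.0.0.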

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Monitor the Maven daily test after the PR is merged.

### Was this patch authored or co-authored using generative AI tooling?
No

Closes #53933 from LuciferYang/SPARK-55128-FOLLOWUP.

Authored-by: yangjie01 <yangjie01@baidu.com>
Signed-off-by: Ruifeng Zheng <ruifengz@apache.org>
LuciferYang added a commit that referenced this pull request Jan 23, 2026
…3' for maven daily test

Closes #53933 from LuciferYang/SPARK-55128-FOLLOWUP.

Authored-by: yangjie01 <yangjie01@baidu.com>
Signed-off-by: Ruifeng Zheng <ruifengz@apache.org>
(cherry picked from commit 3f1c9a3)
Signed-off-by: yangjie01 <yangjie01@baidu.com>
@pan3793
Member

pan3793 commented Feb 11, 2026

@zhengruifeng I guess this is also required for branch-4.0; it seems to have been broken for a while.

pan3793 pushed a commit to pan3793/spark that referenced this pull request Feb 11, 2026
Closes apache#53910 from zhengruifeng/restore_sql.

Authored-by: Ruifeng Zheng <ruifengz@apache.org>
Signed-off-by: Ruifeng Zheng <ruifengz@apache.org>
pan3793 pushed a commit to pan3793/spark that referenced this pull request Feb 11, 2026
…3' for maven daily test

Closes apache#53933 from LuciferYang/SPARK-55128-FOLLOWUP.

Authored-by: yangjie01 <yangjie01@baidu.com>
Signed-off-by: Ruifeng Zheng <ruifengz@apache.org>
pan3793 pushed a commit that referenced this pull request Feb 11, 2026
Backport #53910 and #53933 to branch-4.0

### What changes were proposed in this pull request?

Pandas 3.0 has been released; pin 'pandas==2.3.3' to recover the CI.

### Why are the changes needed?

Recover CI.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Wait for GHA result.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #54263 from pan3793/SPARK-55128-4.0.

Lead-authored-by: Ruifeng Zheng <ruifengz@apache.org>
Co-authored-by: yangjie01 <yangjie01@baidu.com>
Signed-off-by: Cheng Pan <chengpan@apache.org>